(Almost) Automatic Conversion of the Venice Italian Treebank into the Merged Italian Dependency Treebank Format

Authors

  • Linda Alfieri
  • Fabio Tamburini
Abstract

This paper describes the automatic procedure we developed to convert an Italian dependency treebank, the Venice Italian Treebank (VIT), into the Merged Italian Dependency Treebank (MIDT) format. We defined about 4,250 formal rules for rewriting dependencies and token tags, as well as a treebank-rewriting algorithm that avoids interference between rules. At the end of this process, a large portion of the treebank had been converted automatically with very few errors, leaving only a small amount of work to be done manually.
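The abstract does not spell out the rule format or how the algorithm avoids rule interference. One simple way to obtain that property is to evaluate every rule's condition on the original (source) annotation and write the results to a separate copy, so that no rule can fire on the output of another rule. The Python sketch that follows illustrates this idea under stated assumptions: the `Token` and `Rule` types, the `convert` function, and all labels (`DET_SRC`, `SBJ`, `subj`, `nsubj`, ...) are hypothetical and are not the authors' rule syntax or the actual VIT/MIDT tagsets. As a conservative design choice, tokens matched by no rule or by several competing rules are left unchanged and flagged for manual review.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    id: int       # 1-based position in the sentence
    form: str
    pos: str      # token tag
    head: int     # id of the governor, 0 for the root
    deprel: str   # dependency label


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule format: a condition plus optional new tag/label.
    name: str
    # The condition always receives the ORIGINAL sentence, so it may inspect
    # context (heads, siblings) but never sees another rule's output.
    matches: Callable[[List[Token], Token], bool]
    new_pos: Optional[str] = None
    new_deprel: Optional[str] = None


def convert(sentence: List[Token], rules: List[Rule]) -> Tuple[List[Token], List[int]]:
    """Rewrite tags and labels; return the new sentence and ids left for manual work."""
    converted: List[Token] = []
    manual: List[int] = []
    for tok in sentence:
        # Match against the untouched source sentence: no cascading rewrites.
        fired = [r for r in rules if r.matches(sentence, tok)]
        if len(fired) == 1:
            rule = fired[0]
            converted.append(replace(
                tok,
                pos=rule.new_pos if rule.new_pos is not None else tok.pos,
                deprel=rule.new_deprel if rule.new_deprel is not None else tok.deprel,
            ))
        else:
            # Zero or competing rules: keep the token as-is and flag it.
            converted.append(tok)
            manual.append(tok.id)
    return converted, manual


# Toy example with made-up labels: "Il gatto dorme" ("The cat sleeps").
sentence = [
    Token(1, "Il", "ART", 2, "DET_SRC"),
    Token(2, "gatto", "NOUN", 3, "SBJ"),
    Token(3, "dorme", "V", 0, "ROOT"),
]
rules = [
    Rule("det", lambda s, t: t.deprel == "DET_SRC", new_pos="DET", new_deprel="det"),
    Rule("subj", lambda s, t: t.deprel == "SBJ", new_deprel="subj"),
    Rule("root", lambda s, t: t.deprel == "ROOT", new_deprel="root"),
    # Tests a *target* label; it must not fire on tokens rewritten by "subj".
    Rule("subj-refine", lambda s, t: t.deprel == "subj", new_deprel="nsubj"),
]

converted, manual = convert(sentence, rules)
print([(t.form, t.pos, t.deprel) for t in converted])
print(manual)
```

Because `subj-refine` only tests the source label, it does not fire on `gatto` even though that token ends up labelled `subj`; a sequential rewriter that modified tokens in place and then applied the next rule would have cascaded it to `nsubj`.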


Similar resources

A Rule-Based Approach for the Automatic Conversion of Dependency Parse Trees into Constituency Parse Trees for Persian

This paper introduces an automatic method for converting a dependency parse tree into an equivalent phrase-structure tree for the Persian language. First, a rule-based algorithm was designed. Then, Persian-specific dependency-to-phrase-structure conversion rules were merged into the algorithm. Subsequently, the Persian dependency treebank, with about 30,000 sentences, was used as an input ...

Enriching the Venice Italian Treebank with Dependency and Grammatical Relations

In this paper we propose a rule-based approach to extract dependency and grammatical relations from the bracketed tree structures of the Venice Italian Treebank (VIT) (Delmonte et al., 2007). To our knowledge, the only dependency-annotated corpus available for Italian is the Turin University Treebank (Lesmo et al., 2002), which has 25,000 tokens and is about one tenth the size of VIT. As manual corpus ...

Evalita’09 Parsing Task: constituency parsers and the Penn format for Italian

The aim of the Evalita Parsing Task is to define and extend the state of the art in parsing Italian by encouraging the application of existing models and approaches. Therefore, as in the first edition, the Task includes two tracks, i.e. dependency and constituency. The latter track is based on a development set in a format that adapts the Penn Treebank format to Italian, and ...

Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank

The paper addresses the challenge of converting MIDT, an existing dependency-based Italian treebank resulting from the harmonization and merging of smaller resources, into the Stanford Dependencies annotation formalism, with the final aim of constructing a standard-compliant resource for the Italian language. Achieved results include a methodology for converting treebank annotations belonging ...

Comparing Italian parsers on a common Treebank: the EVALITA experience

The Evalita ’07 Parsing Task was the first contest among parsing systems for Italian. It was the first attempt to compare the approaches and results of existing parsing systems for this language on a common treebank annotated in both a dependency-based and a constituency-based format. The development data set for this parsing competition was taken from the Turin University ...

Publication date: 2016